aei_no_std <- read.csv("/Volumes/RachelExternal/Thesis/Data_upload_for_CL/AEI_NoStd.csv") #some data
source("/Volumes/RachelExternal/Thesis/Thesis/Thesis_Functions.R") #some functions
Histograms
We’ve got a lot of things here to log, scale or center.
What do I mean by scaling or centering?
Scaling: Rescaling the predictor into the range [0,1] (for non-negative values) using the formula: \(\frac{y_i}{\max(y)}\)
Centering: Centering the mean of the predictor on 0 and rescaling its standard deviation to 1 using the formula: \(\frac{y_i-\mathrm{mean}(y)}{\mathrm{sd}(y)}\)
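As a minimal sketch of the two formulas, assuming a small made-up vector `y` (not taken from the data set), both transformations can be written directly in base R. The helper names `scale_max` and `center_sd` are hypothetical and used only for illustration; they are not the functions from Thesis_Functions.R.

```r
# Hypothetical example vector; not from the AEI data.
y <- c(2, 4, 6, 8, 10)

# Scaling: divide by the maximum so non-negative values land in [0, 1].
scale_max <- function(y) y / max(y, na.rm = TRUE)

# Centering: subtract the mean and divide by the standard deviation.
center_sd <- function(y) (y - mean(y, na.rm = TRUE)) / sd(y, na.rm = TRUE)

y_scaled   <- scale_max(y)
y_centered <- center_sd(y)

range(y_scaled)    # largest value is exactly 1
mean(y_centered)   # approximately 0
sd(y_centered)     # approximately 1
```

Note that dividing by the maximum only reaches 0 at the low end if the smallest value of the predictor is itself 0.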
Why must I do this?
Well, a couple of reasons. First and foremost, it makes the specification of priors easier, as the distribution of the predictor is already centered on its mean and most of the data points fall within one standard deviation on either side. Another reason is that it makes the interpretation of the coefficients a bit easier, as you can clearly tell which have positive or negative effects.
Let's plot what these look like without being transformed. These graphs are also interactive; feel free to click around.
Main Predictors
Some things of note here: I will be using income instead of total GDP, median Humidity and PET instead of Average, and Humidity with the Inf/NaN values replaced.
A lot of things here are right-skewed (piled up near zero with a long tail to the right), which is to be expected, as the majority of countries are smaller rather than bigger in all aspects. There is little noticeable difference between the regions in many of the predictor variables. Income shows the trends you would expect, with Europe having higher incomes and Sub-Saharan Africa having lower, and some of the high outliers coming from North Africa and the Middle East. There are bigger regional differences in Humidity and PET (remember these two are related: \(Humidity = Precip/PET\)). Precip looks similar to Humidity (again, the similarity was expected). For Ruggedness, regional trends are tough to spot visually.
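To show where Inf/NaN values in Humidity can come from, here is a small sketch of the ratio \(Humidity = Precip/PET\) on made-up numbers. The vectors `precip` and `pet` are hypothetical, and replacing the non-finite values with `NA` is only an assumption for illustration; the replacement actually used in the data set is not shown here.

```r
# Hypothetical values; not from the AEI data.
precip <- c(100, 0, 50, 0)
pet    <- c(50, 25, 0, 0)

# Dividing by a PET of zero produces Inf (50/0) or NaN (0/0).
humidity <- precip / pet

# Assumed replacement for illustration: turn non-finite values into NA.
humidity_clean <- ifelse(is.finite(humidity), humidity, NA_real_)

humidity
humidity_clean
```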
Crop Fractions
What about crop fractions?
Standardization
I assume a lot of these predictors accumulate exponentially. Population, Income and total GDP are obvious ones. Some others seem to be exponential as well, given the distribution of their histograms. Precipitation seems to accumulate exponentially, as well as most of the crop fractions.
- area_km - Center
- population - Log and Center
- income - Log and Center
- GDP - Log and Center
- Median Humidity - Center
- Median PET - Center
- Precip - Log and Center
- Ruggedness - Scale
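To illustrate why the "Log and Center" entries in the list above take the log first, here is a sketch on simulated data. The values are drawn with `rlnorm()` purely for illustration and are not the thesis data.

```r
set.seed(1)

# Simulated right-skewed predictor (hypothetical, log-normal).
x <- rlnorm(1000, meanlog = 10, sdlog = 2)

# Centering alone leaves the long right tail in place.
x_centered <- as.numeric(scale(x))

# Logging first pulls the tail in; centering then puts it on a z-score scale.
x_log_centered <- as.numeric(scale(log(x)))

# Most extreme value of each version, in standard deviations from the mean.
max(abs(x_centered))
max(abs(x_log_centered))
```

The logged version should show a much smaller extreme value, which is the behaviour that makes priors on the centered scale easier to specify.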
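# Assumes dplyr is attached and normalized() is available (presumably via Thesis_Functions.R).
aei_std <-
  aei_no_std %>%
  select(-c(20, 21, 24, 28)) %>%            # drop unused columns (by position)
  mutate(across(c(17:19, 23), log)) %>%     # log population, income, GDPtot, cubM_precip
  # center: scale() returns one-column matrices, hence the ".V1" names in summary()
  mutate(across(c(8, 17:19, 20:23, 25:51), scale, scale = TRUE)) %>%
  mutate(across(c(24), normalized))         # scale rugged to [0, 1]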
summary(aei_std)
## X ISO year country
## Min. : 1.0 Length:3003 Min. :1910 Length:3003
## 1st Qu.: 751.5 Class :character 1st Qu.:1940 Class :character
## Median :1502.0 Mode :character Median :1970 Mode :character
## Mean :1502.0 Mean :1964
## 3rd Qu.:2252.5 3rd Qu.:1990
## Max. :3003.0 Max. :2005
##
## ID aei_ha yearcount area_km.V1
## Min. : 1.0 Min. : 0 Min. : 0.00 Min. :-0.34652
## 1st Qu.: 62.0 1st Qu.: 0 1st Qu.:30.00 1st Qu.:-0.33996
## Median :125.0 Median : 12852 Median :60.00 Median :-0.29198
## Mean :125.5 Mean : 804385 Mean :54.23 Mean : 0.00000
## 3rd Qu.:187.0 3rd Qu.: 200700 3rd Qu.:80.00 3rd Qu.:-0.09049
## Max. :400.0 Max. :64646000 Max. :95.00 Max. : 8.91977
## NA's :260
## irrperc irrfrac four_regions eight_regions
## Min. : 0.00000 Min. :0.00000 Length:3003 Length:3003
## 1st Qu.: 0.00586 1st Qu.:0.00006 Class :character Class :character
## Median : 0.21376 Median :0.00214 Mode :character Mode :character
## Mean : 1.67096 Mean :0.01671
## 3rd Qu.: 1.53916 3rd Qu.:0.01539
## Max. :37.41152 Max. :0.37412
## NA's :260 NA's :260
## six_regions Latitude Longitude World.bank.region
## Length:3003 Min. :-42.00 Min. :-175.00 Length:3003
## Class :character 1st Qu.: 4.00 1st Qu.: -9.50 Class :character
## Mode :character Median : 17.05 Median : 20.00 Mode :character
## Mean : 18.81 Mean : 20.06
## 3rd Qu.: 39.75 3rd Qu.: 48.00
## Max. : 65.00 Max. : 179.14
## NA's :650 NA's :650
## population.V1 income.V1 GDPtot.V1 medHumid.V1
## Min. :-3.0977 Min. :-2.0902 Min. :-3.2642 Min. :-1.23445
## 1st Qu.:-0.6125 1st Qu.:-0.7890 1st Qu.:-0.6816 1st Qu.:-0.78071
## Median : 0.1513 Median :-0.1326 Median : 0.0615 Median :-0.06354
## Mean : 0.0000 Mean : 0.0000 Mean : 0.0000 Mean : 0.00000
## 3rd Qu.: 0.6386 3rd Qu.: 0.6983 3rd Qu.: 0.6851 3rd Qu.: 0.54192
## Max. : 2.7152 Max. : 3.1518 Max. : 2.7272 Max. : 8.91519
## NA's :650 NA's :663 NA's :663 NA's :143
## medHumid2.V1 medPET.V1 cubM_precip.V1 rugged
## Min. :-1.23054 Min. :-2.75410 Min. :-4.04948 Min. :0.00000
## 1st Qu.:-0.78834 1st Qu.:-0.67067 1st Qu.:-0.67945 1st Qu.:0.04519
## Median :-0.06774 Median : 0.26460 Median : 0.02136 Median :0.12290
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean :0.17603
## 3rd Qu.: 0.52698 3rd Qu.: 0.69194 3rd Qu.: 0.69510 3rd Qu.:0.25451
## Max. : 8.66093 Max. : 2.10413 Max. : 2.65292 Max. :1.00000
## NA's :143 NA's :143 NA's :143 NA's :52
## Temperate_cereals.V1 Rice.V1 Maize.V1 Tropical_cereals.V1
## Min. :-0.71529 Min. :-0.52190 Min. :-0.72777 Min. :-0.34949
## 1st Qu.:-0.61383 1st Qu.:-0.52190 1st Qu.:-0.64338 1st Qu.:-0.34040
## Median :-0.17326 Median :-0.41011 Median :-0.16466 Median :-0.20449
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.18603 3rd Qu.: 0.15985 3rd Qu.: 0.20335 3rd Qu.: 0.00788
## Max. :12.80401 Max. :13.21499 Max. :16.02787 Max. :26.13316
## NA's :143 NA's :143 NA's :143 NA's :143
## Pulses.V1 Temperate_roots.V1 Tropical_roots.V1 Sunflower.V1
## Min. :-0.48586 Min. :-0.28270 Min. :-0.43966 Min. :-0.49024
## 1st Qu.:-0.43492 1st Qu.:-0.28270 1st Qu.:-0.43966 1st Qu.:-0.49024
## Median :-0.18152 Median :-0.24262 Median :-0.25750 Median :-0.29274
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.08921 3rd Qu.:-0.00056 3rd Qu.: 0.08321 3rd Qu.: 0.08903
## Max. :18.45318 Max. :30.30769 Max. :25.09540 Max. :18.13688
## NA's :143 NA's :143 NA's :143 NA's :143
## Soybean.V1 Groundnuts.V1 Rapeseed.V1 Sugarcane.V1
## Min. :-0.44055 Min. :-0.42903 Min. :-0.32214 Min. :-0.16489
## 1st Qu.:-0.44055 1st Qu.:-0.42903 1st Qu.:-0.32214 1st Qu.:-0.16489
## Median :-0.33527 Median :-0.24285 Median :-0.20053 Median :-0.12973
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.07736 3rd Qu.: 0.09089 3rd Qu.: 0.00380 3rd Qu.:-0.02986
## Max. :18.83110 Max. :28.73295 Max. :24.24173 Max. :37.48230
## NA's :143 NA's :143 NA's :143 NA's :143
## Others.V1 Managed_Grasslands.V1 Temperate_cereals.1.V1
## Min. :-0.86416 Min. :-1.14803 Min. :-0.24269
## 1st Qu.:-0.61199 1st Qu.:-0.48318 1st Qu.:-0.24269
## Median :-0.10841 Median :-0.13424 Median :-0.19144
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.23593 3rd Qu.: 0.27012 3rd Qu.:-0.03013
## Max. :11.81541 Max. : 9.41777 Max. :23.46723
## NA's :143 NA's :143 NA's :143
## Rice.1.V1 Maize.1.V1 Tropical_cereals.1.V1 Pulses.1.V1
## Min. :-0.33405 Min. :-0.37124 Min. :-0.24242 Min. :-0.21988
## 1st Qu.:-0.33405 1st Qu.:-0.37124 1st Qu.:-0.24242 1st Qu.:-0.21988
## Median :-0.29997 Median :-0.22663 Median :-0.24242 Median :-0.19206
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.: 0.00570 3rd Qu.: 0.03389 3rd Qu.:-0.07847 3rd Qu.:-0.05356
## Max. :29.12739 Max. :20.65714 Max. :19.64764 Max. :26.12341
## NA's :143 NA's :143 NA's :143 NA's :143
## Temperate_roots.1.V1 Sunflower.1.V1 Soybean.1.V1 Groundnuts.1.V1
## Min. :-0.26220 Min. :-0.14977 Min. :-0.23353 Min. :-0.25187
## 1st Qu.:-0.26220 1st Qu.:-0.14977 1st Qu.:-0.23353 1st Qu.:-0.25187
## Median :-0.26220 Median :-0.14977 Median :-0.23353 Median :-0.25187
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.:-0.07223 3rd Qu.:-0.06512 3rd Qu.:-0.04155 3rd Qu.:-0.04506
## Max. :19.86639 Max. :30.77447 Max. :37.37319 Max. :37.77301
## NA's :143 NA's :143 NA's :143 NA's :143
## Rapeseed.1.V1 Sugarcane.1.V1 Others.1.V1
## Min. :-0.17788 Min. :-0.29669 Min. :-0.31794
## 1st Qu.:-0.17788 1st Qu.:-0.29669 1st Qu.:-0.30152
## Median :-0.17788 Median :-0.26895 Median :-0.15380
## Mean : 0.00000 Mean : 0.00000 Mean : 0.00000
## 3rd Qu.:-0.15152 3rd Qu.:-0.05969 3rd Qu.: 0.03691
## Max. :26.12795 Max. :16.30080 Max. :39.03201
## NA's :143 NA's :143 NA's :143
## Managed_Grasslands.1.V1
## Min. :-0.28247
## 1st Qu.:-0.28247
## Median :-0.21928
## Mean : 0.00000
## 3rd Qu.:-0.03580
## Max. :25.22847
## NA's :143
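The `.V1` suffixes in the summary above arise because `scale()` returns a one-column matrix rather than a plain vector. The following sketch, on a hypothetical toy data frame rather than the AEI data, shows the difference and one way to get ordinary numeric columns by wrapping the call in `as.numeric()`.

```r
# Hypothetical toy data frame; not the AEI data.
df <- data.frame(a = c(1, 2, 3, 4))

# scale() returns a one-column matrix, so the stored column is a matrix.
df$a_matrix <- scale(df$a)

# Wrapping in as.numeric() drops the matrix structure and its attributes.
df$a_vector <- as.numeric(scale(df$a))

is.matrix(df$a_matrix)
is.matrix(df$a_vector)
all.equal(as.numeric(df$a_matrix), df$a_vector)
```

The values are identical either way; only the storage type differs, which can matter for functions that expect plain numeric vectors.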